Goto

Collaborating Authors

 ai red team


From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming

Sinha, Anusha, Grimes, Keltin, Lucassen, James, Feffer, Michael, VanHoudnos, Nathan, Wu, Zhiwei Steven, Heidari, Hoda

arXiv.org Artificial Intelligence

A red team simulates adversary attacks to help defenders find effective strategies to defend their systems in a real-world operational setting. As more enterprise systems adopt AI, red-teaming will need to evolve to address the unique vulnerabilities and risks posed by AI systems. We take the position that AI systems can be more effectively red-teamed if AI red-teaming is recognized as a domain-specific evolution of cyber red-teaming. Specifically, we argue that existing Cyber Red Teams who adopt this framing will be able to better evaluate systems with AI components by recognizing that AI poses new risks, has new failure modes to exploit, and often contains unpatchable bugs that re-prioritize disclosure and mitigation strategies. Similarly, adopting a cybersecurity framing will allow existing AI Red Teams to leverage a well-tested structure to emulate realistic adversaries, promote mutual accountability with formal rules of engagement, and provide a pattern to mature the tooling necessary for repeatable, scalable engagements. In these ways, the merging of AI and Cyber Red Teams will create a robust security ecosystem and best position the community to adapt to the rapidly changing threat landscape.


Lessons From Red Teaming 100 Generative AI Products

Bullwinkel, Blake, Minnich, Amanda, Chawla, Shiven, Lopez, Gary, Pouliot, Martin, Maxwell, Whitney, de Gruyter, Joris, Pratt, Katherine, Qi, Saphir, Chikanov, Nina, Lutz, Roman, Dheekonda, Raja Sekhar Rao, Jagdagdorj, Bolor-Erdene, Kim, Eugenia, Song, Justin, Hines, Keegan, Jones, Daniel, Severi, Giorgio, Lundeen, Richard, Vaughan, Sam, Westerhoff, Victoria, Bryan, Pete, Kumar, Ram Shankar Siva, Zunger, Yonatan, Kawaguchi, Chang, Russinovich, Mark

arXiv.org Artificial Intelligence

In recent years, AI red teaming has emerged as a practice for probing the safety and security of generative AI systems. Due to the nascency of the field, there are many open questions about how red teaming operations should be conducted. Based on our experience red teaming over 100 generative AI products at Microsoft, we present our internal threat model ontology and eight main lessons we have learned: 1. Understand what the system can do and where it is applied 2. You don't have to compute gradients to break an AI system 3. AI red teaming is not safety benchmarking 4. Automation can help cover more of the risk landscape 5. The human element of AI red teaming is crucial 6. Responsible AI harms are pervasive but difficult to measure 7. LLMs amplify existing security risks and introduce new ones 8. The work of securing AI systems will never be complete By sharing these insights alongside case studies from our operations, we offer practical recommendations aimed at aligning red teaming efforts with real world risks. We also highlight aspects of AI red teaming that we believe are often misunderstood and discuss open questions for the field to consider.


Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle

Haider, Emman, Perez-Becker, Daniel, Portet, Thomas, Madan, Piyush, Garg, Amit, Ashfaq, Atabak, Majercak, David, Wen, Wen, Kim, Dongwoo, Yang, Ziyi, Zhang, Jianwen, Sharma, Hiteshi, Bullwinkel, Blake, Pouliot, Martin, Minnich, Amanda, Chawla, Shiven, Herrera, Solianna, Warreth, Shahed, Engler, Maggie, Lopez, Gary, Chikanov, Nina, Dheekonda, Raja Sekhar Rao, Jagdagdorj, Bolor-Erdene, Lutz, Roman, Lundeen, Richard, Westerhoff, Tori, Bryan, Pete, Seifert, Christian, Kumar, Ram Shankar Siva, Berkley, Andrew, Kessler, Alex

arXiv.org Artificial Intelligence

Recent innovations in language model training have demonstrated that it is possible to create highly performant models that are small enough to run on a smartphone. As these models are deployed in an increasing number of domains, it is critical to ensure that they are aligned with human preferences and safety considerations. In this report, we present our methodology for safety aligning the Phi-3 series of language models. We utilized a "break-fix" cycle, performing multiple rounds of dataset curation, safety post-training, benchmarking, red teaming, and vulnerability identification to cover a variety of harm areas in both single and multi-turn scenarios. Our results indicate that this approach iteratively improved the performance of the Phi-3 models across a wide range of responsible AI benchmarks. Finally, we include additional red teaming strategies and evaluations that were used to test the safety behavior of Phi-3.5-mini and Phi-3.5-MoE, which were optimized for multilingual capabilities.


Microsoft's AI Red Team Has Already Made the Case for Itself

WIRED

For most people, the idea of using artificial intelligence tools in daily life--or even just messing around with them--has only become mainstream in recent months, with new releases of generative AI tools from a slew of big tech companies and startups, like OpenAI's ChatGPT and Google's Bard. But behind the scenes, the technology has been proliferating for years, along with questions about how best to evaluate and secure these new AI systems. On Monday, Microsoft is revealing details about the team within the company that since 2018 has been tasked with figuring out how to attack AI platforms to reveal their weaknesses. In the five years since its formation, Microsoft's AI red team has grown from what was essentially an experiment into a full interdisciplinary team of machine learning experts, cybersecurity researchers, and even social engineers. The group works to communicate its findings within Microsoft and across the tech industry using the traditional parlance of digital security, so the ideas will be accessible rather than requiring specialized AI knowledge that many people and organizations don't yet have.


Vulnerabilities May Slow Air Force's Adoption of Artificial Intelligence

#artificialintelligence

The Air Force needs to better prepare to defend AI programs and algorithms from adversaries that may seek to corrupt training data, the service's deputy chief of staff for intelligence, surveillance, reconnaissance and cyber effects said Wednesday. "There's an assumption that once we develop the AI, we have the algorithm, we have the training data, it's giving us whatever it is we want it to do, that there's no risk. There's no threat," said Lt. Gen. Mary F. O'Brien, the Air Force's deputy chief of staff for intelligence, surveillance, reconnaissance and cyber effects operations. That assumption could be costly to future operations. Speaking at the Air Force Association's Air, Space and Cyber conference, O'Brien said that while deployed AI is still in its infancy, the Air Force should prepare for the possibility of adversaries using the service's own tools against the United States.